Time-Accuracy Tradeoff


Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

Birnbaum, Aharon, Shalev-Shwartz, Shai

Neural Information Processing Systems

Given $\alpha, \epsilon$, we study the time complexity required to improperly learn a halfspace with misclassification error rate of at most $(1+\alpha)\,L^*_\gamma + \epsilon$, where $L^*_\gamma$ is the optimal $\gamma$-margin error rate. For $\alpha = 1/\gamma$, polynomial time and sample complexity are achievable using the hinge loss. For $\alpha = 0$, \cite{ShalevShSr11} showed that $\mathrm{poly}(1/\gamma)$ time is impossible, while learning is possible in time $\exp(\tilde{O}(1/\gamma))$. An immediate question, which this paper tackles, is what is achievable for $\alpha \in (0, 1/\gamma)$. We derive positive results interpolating between the polynomial time for $\alpha = 1/\gamma$ and the exponential time for $\alpha = 0$. In particular, we show that there are cases in which $\alpha = o(1/\gamma)$ but the problem is still solvable in polynomial time.
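The hinge-loss surrogate mentioned in the abstract can be illustrated with a minimal subgradient-descent learner. The function name, step size, and toy data below are illustrative assumptions for the general technique, not details taken from the paper:

```python
import numpy as np

def learn_halfspace_hinge(X, y, epochs=100, lr=0.1):
    """Approximately minimize the average hinge loss over halfspaces
    via subgradient descent (a convex surrogate for the zero-one loss)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)      # y_i <w, x_i>
        active = margins < 1.0     # points inside the margin contribute
        # subgradient of (1/n) * sum_i max(0, 1 - y_i <w, x_i>)
        grad = -(y[active, None] * X[active]).sum(axis=0) / n
        w -= lr * grad
    return w

# toy linearly separable data (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
w = learn_halfspace_hinge(X, y)
err = np.mean(np.sign(X @ w) != y)  # zero-one error of the learned halfspace
```

On separable data the zero-one error of the hinge-loss minimizer drops quickly, which is the surrogate's appeal; the paper's question is how much of this tractability survives as the target guarantee tightens toward $\alpha = 0$.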


Burn-in, bias, and the rationality of anchoring

Lieder, Falk, Griffiths, Tom, Goodman, Noah

Neural Information Processing Systems

Bayesian inference provides a unifying framework for addressing problems in machine learning, artificial intelligence, and robotics, as well as the problems facing the human mind. Unfortunately, exact Bayesian inference is intractable in all but the simplest models. Therefore minds and machines have to approximate Bayesian inference. Approximate inference algorithms can achieve a wide range of time-accuracy tradeoffs, but what is the optimal tradeoff? We investigate time-accuracy tradeoffs using the Metropolis-Hastings algorithm as a metaphor for the mind's inference algorithm(s). We find that reasonably accurate decisions are possible long before the Markov chain has converged to the posterior distribution, i.e. during the period known as burn-in.
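The burn-in effect is easy to reproduce with a textbook random-walk Metropolis-Hastings sampler. The target, starting point, and step size below are illustrative assumptions, not the paper's experimental setup; starting the chain far from the posterior mean mimics an "anchor":

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings: propose x' ~ N(x, step^2),
    accept with probability min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n_steps):
        prop = x + step * rng.normal()
        lp_prop = log_p(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            x, lp = prop, lp_prop
        samples.append(x)
    return np.array(samples)

# target: standard normal posterior; start far away (the "anchor") at x0 = 10
log_p = lambda x: -0.5 * x * x
chain = metropolis_hastings(log_p, x0=10.0, n_steps=5000)
early = chain[:200].mean()  # estimate during burn-in: biased toward the anchor
late = chain.mean()         # longer run: closer to the true posterior mean 0
```

Truncating the chain early yields an estimate biased toward the starting point, which is the algorithmic analogue of anchoring; running longer trades time for accuracy.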


Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs

Scetbon, Meyer, Peyré, Gabriel, Cuturi, Marco

arXiv.org Machine Learning

The ability to compare and align related datasets living in heterogeneous spaces plays an increasingly important role in machine learning. The Gromov-Wasserstein (GW) formalism can help tackle this problem. Its main goal is to seek an assignment (more generally a coupling matrix) that can register points across otherwise incomparable datasets. As a non-convex, quadratic generalization of optimal transport (OT), GW is NP-hard. Yet, heuristics are known to work reasonably well in practice, with the state-of-the-art approach being to solve a sequence of nested regularized OT problems. While popular, that heuristic remains too costly to scale, with cubic complexity in the number of samples $n$. We show in this paper how a recent variant of the Sinkhorn algorithm can substantially speed up the resolution of GW. That variant restricts the set of admissible couplings to those admitting a low-rank factorization as the product of two sub-couplings. By alternately updating each sub-coupling, our algorithm computes a stationary point of the problem in quadratic time with respect to the number of samples. When the cost matrices themselves have low rank, our algorithm has time complexity $\mathcal{O}(n)$. We demonstrate the efficiency of our method on simulated and real data.
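The regularized OT subproblems at the core of the nested heuristic are typically solved with Sinkhorn iterations. Below is a minimal full-rank Sinkhorn sketch for a single entropic OT problem, not the paper's low-rank variant; the regularization strength, iteration count, and toy data are illustrative assumptions:

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iters=500):
    """Entropic-regularized OT: alternately rescale K = exp(-C/eps)
    until the coupling P = diag(u) K diag(v) matches marginals a and b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)   # match column marginals
        u = a / (K @ v)     # match row marginals
    return u[:, None] * K * v[None, :]

# toy point clouds of different sizes, uniform weights (illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(7, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
C = C / C.max()                                      # normalize cost scale
a = np.full(5, 1 / 5)
b = np.full(7, 1 / 7)
P = sinkhorn(C, a, b)  # dense n x m coupling: O(nm) memory per iteration
```

Each iteration of this dense solver touches the full $n \times m$ kernel, which is what drives the cubic cost of the nested GW heuristic; the paper's speedup comes from never materializing such full couplings, keeping only low-rank factors.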


